Optimal incorporation of sparsity information by 

weighted t\ optimization 



Toshiyuki Tanaka Jack Raymond 

Graduate School of Informatics Department of Physics 

University of Kyoto Hong Kong University of Science and Technology 
Kyoto, Japan Hong Kong, China 

Email: tt@i. kyoto-u.ac.jp Email: jackraymond@physics.org 



Abstract — Compressed sensing of sparse sources can be im- 
proved by incorporating prior knowledge of the source. In this 
paper we demonstrate a method for optimal selection of weights 
in weighted l\ norm minimization for a noiseless reconstruction 
model, and show the improvements in compression that can be 
achieved. 

I. Introduction 

Compressed sensing research has the aim of improving the 
efficiency and reliability of source estimation when the number 
of measurements taken is smaller than the number of degrees 
of freedom in the source. Although somewhat counterintuitive, 
it turns out that, given some special structure in the source, it 
is possible to reconstruct accurately all source components (T). 

Owing to the fundamental nature of the problem, it is not 
surprising that research and methodology span a number of 
fields such as image processing, topology, multi-user detection, 
and convex optimization |2j, 0, iHJ, 0, (6]. In this article 
we also apply methods developed initially within physics, that 
have recently been applied to compressed sensing [7], [8|, Q. 

In the canonical model, a real-valued vector y constitutes 
observations of a real-valued source vector x° via a measure- 
ment matrix A as 

y = Ax° . (l) 

The dimension of y is M and the dimension of x° is 
N(> M). The parameter a = M/N quantifies the number 
of observations per the degree of freedom of the source, and 
we call it the compression rate. The problem of compressed 
sensing is to reconstruct the source x° from the observations 
y and the measurement matrix A. 

When the compression rate a is smaller than one, the 
problem amounts to solving an ill-conditioned set of linear 
equations and hence its solution is not unique. However, if 
the source is suitably structured, then such structural infor- 
mation may be utilized to overcome the ill-conditionedness. 
One example is sparsity, whereby the source has many zero 
components. For reconstruction of sparse sources it is now 
well known that provided a is not too small an accurate re- 
construction of the source x° may be achieved by minimizing 
the i\ norm 

£>i| (2) 
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subject to the constraint y = Ax. This minimization problem 
can be solved efficiently by casting it into a problem of linear 
programming. It has recently been shown that l\ minimization 
also allows perfect reconstruction of sparse sources j4|, Q 
with high probability for large systems provided that the 
compression rate a is larger than a certain threshold value a c . 
Furthermore a sharp transition between perfect and imperfect 
reconstruction is observed at a = a c , which we call the 
compression threshold. 

We shall consider weighted norms, in which each compo- 
nent in the sum (O has a different coefficient (weight). The 
consideration of non-uniform weights has been motivated by 
Bayesian and topological arguments, algorithmic studies and 
much empirical evidence. Numerous results indicate that use 
of non-uniform weights can reduce the compression threshold, 
guarantee optimality in a probabilistic sense, or otherwise 
improve algorithmic performance iflOl . ifTTIl . |fl~2), Hl3l . A 
topology based method able to place bounds on weighted 
reconstruction was recently proposed, and demonstrated for 
the case of a source with two blocks of distinct density of 
non-zero components (3). 

II. Problem 
A. Reconstruction with marginal density information 

The problem we address is the following: given a mean 
density of non-zero components, and some additional prior 
information on the distribution of non-zero components, how 
can the weights be chosen so as to minimize the number of 
measurements required to reliably reconstruct the source using 
weighted l\ norm (w-£i) minimization. 

We assume a standard model framework in which compo- 
nents of A are independent and identically-distributed random 
variables; drawn from a symmetric distribution of finite vari- 
ance and higher-order moments without finite probability mass 
at the origin, the Gaussian distribution being one intuitive case. 
The observations y are also random variables, determined as 
functions of the measurement matrix and a random source 
through (Q])- 

In the standard sparse problem a fraction < p < 1 of all 
source components are non-zero, this subset being selected 
uniformly at random. It is instead assumed in this article 
that a (potentially distinct) marginal probability is known for 



every component in the source, so that is non-zero with 
probability < pi < 1 independently of other components. 
We call the parameter pi the density of component i. The prior 
is therefore 

P(x ) = U[p i S(x i ) + (l-p i )8(x° i )] , (3) 

i=l 

where S(x®) is the conditional prior of x® given that it is 
non-zero. The conditional prior S(x®) is assumed unknown 
to the observer, but it should have finite moments and should 
not have finite probability weight at x® = 0. The densities 
{pi} encode prior information about sparsity of the source, are 
assumed available to the observer, and will be called marginal 
density information. This definition covers the case of block- 
structured sources, wherein a source is divided into sets of 
components according to their densities, including two blocks 
as a special case. 

To each source component is also associated a weight Wi > 
0. The estimate of the source is obtained as a solution to an 
optimization problem: minimization of the weighted l\ norm 
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subject to 



Ax 
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We note irrespective of weighting a trivial lower bound on 
the compression required for perfect reconstruction: a c > p. 
The lower bound is easily achieved, by a matrix inverse 
operation, when the positions of all zeros in the source are 
known - or equivalently all the densities pi are extremal, 
or 1. Any variation from these extremal values for fixed p 
increases the uncertainty in the source values, and so will 
increase the compression threshold. 

The principal results in this article are regarding: 

(1) The compression threshold a c for perfect reconstruction, 
with high probability, given a set of weights; 

(2) The assignment of weights, as a function of the marginal 
densities, that achieves a minimal a c , and the value of 
this minimum. 

III. Analysis for w-£i minimization 

A. Overview 

Rather than directly finding and classifying minima of the 
w-£i minimization problem, an auxiliary probability distribu- 
tion may be introduced, for which the maximum a posteriori 
(MAP) solutions are equivalent to the solutions of the opti- 
mization problem. A parameter f3 is introduced to soften the 
probability distribution and to allow application of analytic 
methods, and this is finally taken to be large so as to recover 
the posterior mean coincident with the MAP solution as well 
as the solution of the w-£i minimization. The probability is 
the model posterior 



concentrated on values x that solve the weighted minimization 
problem We further define a generating function in the 
large-system limit and its expectation value with respect to 
measurements 

f = f(f3,N) = ~iog Z, f = lim lim E Ay [/] . 

piv p^-oo N—too 

(6) 

The function f(/3,N) is linearly related to a generalized 
version of the mutual information between the source and the 
observations 03), lfT31 . 

Statistics on w-£\ solutions are extracted from /, or /, by 
adding small perturbations to the model posterior and taking 
derivatives. Consider adding a small perturbation — /3h\\x — 
x°||2 to the exponent of (0, whereby the normalization Z 
now depends on h as well. Here, || • H2 denotes the £2 norm. 
The normalized mean square error (MSE) with respect to the 
model posterior Q is given by 



A r_1 E[ 



(7) 



which in the limit (3 —> 00 gives the normalized MSE of the 
w-£i solution. If MSE is evaluated to be zero, this implies that 
the w-£i minimization is sufficient for perfect reconstruction. 

An analysis of / is problematic owing to its definition in 
terms of random measurement parameters, but we mitigate for 
this by studying the asymptotics. Many interesting statistics, 
such as MSE, are found to be self-averaging - the statistics 
derived from / for almost all y and A converge to those 
derived from / for large N, We thus analyze the expectation 
value / in place of the random object /, expressing the former 
in an analytic form at leading orders in N and j3, thereby 
demonstrating conditions for perfect reconstruction. 

In our analysis we assume, besides the self-averaging 
property, the validity of exchanging order of limits. We will 
solve the problem by the replica method, which introduces 
further heuristic assumptions - the replica symmetric (RS) 
solution is developed. We are unable to verify rigorously each 
assumption, but the methodology is a standard one Q, O, 
191 , and a catalogue of successes indicates the results herein 
should be treated as exact. Some closely related results in 
compressed sensing [7| are already known to coincide with 
those of rigorous methodologies HJ. 

B. The replica method 

The expectation with respect to the random variables © is 
taken by a standard approach. The generating function may 
finally be expressed in an extremization form, as 



/ = Extr. 
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P(x|y, A) = i<5(Ax-y)exp^- i 9^«; i |x i |^ (5) + Pl S{x )wle((^/iX + mx°)/ Wl ,qf) DXdx ^ } , (8) 



where Z is the normalization for the probability distribution. 
In the limit of large /3 the probability measure will be 



where DX = (2tt) 1 l 2 e A l 2 dX is a standard Gaussian mea- 
sure, where E{ P4) „, i }[- ■•] = ^V _1 Y^i=\{' ' ' ) is an empirical 



average with respect to the densities and weights of source 
components, and where the function e(-, •) is defined as 



e(a,b) = -■ 
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(9) 



A solution is the extremum where the partial derivatives with 
respect to the six order parameters are zero, and the set of self- 
consistent equations describing the order parameters are called 
the saddle-point equations. Note that for the special case of 
uniform weights and Gaussian S(x®), $Q becomes identical to 
results derived in the absence of marginal prior knowledge Q. 

The order parameters for the correct solution of dHJ co- 
incide with informative quantities: q is the mean-squared 
estimate TV -1 E[||x|||]; m is the overlap of source and estimate 
A r ~ 1 E[x • x ]; and \ is a statistic on the pair correlation 
functions. The mean square error is 7V _1 E[||x — x !^] = 
p — 2m + q. 

The six saddle-point equations are easily reduced to three 
equations on the conjugate parameters (denoted by hat) rh, q 
and \, and further to two equations by identifying q = rh. 
Given a specific parameterization of the weights and mea- 
surement process these might be solved numerically. However, 
given our noiseless model (fTJ it is known that the mean square 
error will be zero in some range of large a, and reasonable 
weights. This perfect reconstruction solution was found in the 
unweighted case with 1/m = Q, and generalizes to the 
weighted case. 

We can solve the equations to leading order in 1/rh. At 
leading order many features of S(x®), the prior on non-zero 
components, are inconsequential. It is convenient to define two 
functions of x 

g 1 (x) = P + E[(l- Pl )2Q(u l )}, (10) 
where ui — Wi / \fx is a rescaled weight, and 

2exp(- M 2/2) 



92{x) = P + E[ Pl (u l ) 2 }- 
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+ E[(l- j o J )(l + (^) 2 )2Q(w l )] , 



(11) 



where Q(z) = DX is the conventional Q-function. The 
two saddle-point equations may be expressed at leading order 

as 



and 



X = x-. 
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where the latter equation can be solved numerically. 



(12) 
(13) 



C. Stability criteria 

For the RS solution of the replica method a local stability 
analysis gives necessary criteria for the validity of any solu- 
tions found, and can be studied by three eigenvalues derived 
from the full, rather than RS ©, saddle-point form |16|. 
Two eigenvalues relate to instabilities consistent with the RS 
saddle-point equation jSJ, and can be evaluated within the RS 



framework, by analysis of (fT2l) and dT3l . The final 'replicon' 
eigenvalue can indicate a failure of the RS assumption yielding 
©, but cannot be derived from ([HJ. When the RS assumption 
fails in this sense, a symmetry breaking approximation should 
be required. 

Stability within the RS framework requires that a small fluc- 
tuation in the order parameters of the right-hand side of (TT~2b 
and ( TT3l decays to zero under iteration of the equations. Small 
fluctuations in x and m 1 about the perfect reconstruction 
solution are stable provided that 



a > 9i(x) ■ 



(14) 



It can be seen that for large a and small p these conditions 
will be met. The replicon eigenvalue is consistent with the 
stability of the solution. 

IV. Main results 

A. Threshold equations 

Stability and uniqueness of the perfect reconstruction so- 
lution implies success of w-£i reconstruction up to errors 
of order 1/N, with high probability. The instability criterion 
is met as a is decreased from one so that there exists a 
compression threshold ot c < 1. A pair of equations describe 
the threshold in terms of only one order parameter x 

«c=.9i(x) and g 3 (x) = 92(x) ~ 9i{x) = • (15) 

The latter equality is a consequence of (U~3b combined with 
(fl4l . and is independent of a c . Aside from the explicit 
dependence on x, there is a dependence on the densities and 
weights. The two equations ( fl5l l provide implicit expressions 
for the dependences of the compression threshold a c on the 
set {pi, Wi} of densities and weights. 

If one assumes uniform weights, the compression threshold 
a c is described as an implicit function of p by 



a- 



exp 



1 - 2Q 
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and 



1-p 



(16) 



(17) 



For an inhomogeneous system of fixed p, any knowledge 
of the marginal densities constitutes additional information. 
Incorporation of non-uniform weights introduces additional 
degrees of freedom in the model that can be exploited to 
reduce a c . It also means that it is necessary to specify the 
weights according to the given set of densities in order to find 
the optimum value of a c . 

B. Optimization of a c by weight selection 

In order to select optimally the weights, we consider the 
optimization problem of minimizing the compression thresh- 
old with respect to the weights. For this purpose we can take 
X and {pi} to be fixed parameters and minimize a c = g\ with 
respect to the weights {w;} subject to the constraint that g% 



is zero (see (fT5ll). The constraint can be dealt with via the 
Lagrange multiplier method as 

min (gt - Xg 3 ) . (18) 
The derivative with respect to Wi leads to the following criteria 

Ul exp (u 2 J2) (-£*- + 2Q(u l )) = , (19) 

\ 1 — Pi J Ay ATT 

combined with the derivative with respect to A, which reads 




= 0, (20) 



yielding A = 1. Therefore the set of equations ( fT9l , each of 
which determines one weight, are independent given \ an d Pi- 

Using this independence it is possible to optimally set 
the weights for an arbitrary density distribution {pi} by the 
following procedure. Solve (fT9l for m, and the solution is 
a function of pi only. The set of solutions {ui} defines the 
optimal weights up to the overall scaling. Observing that the 
w-£ i minimization problems are invariant under the overall 
scaling, every weight can be assigned straightforwardly as 
Wi — Ui. This set of weights will minimize a c . 

To evaluate the minimum of a c , one observes from (TTOb 
and (TT3T > that a c = E[a{pi)] holds, where a(pi) = pi + 
(1 — pi)2Q(Ui(pi)) . This expression can be understood as 
if each component of the source, with density p, would 
require a compression rate at least a(p) for its perfect re- 
construction, and the lower bound of the total compression 
rate is the empirical average of this componentwise bound 
over all components. This interpretation would further suggest 
a possibly fundamental importance of the quantity a(p) as 
a measure of some sort of information associated with a 
random variable that takes non-zero values with probability 
p. One can also confirm, by particularizing our results to the 
unweighted system, that a(p) gives the compression threshold 
of the unweighted system with density p. Thus, evaluation of 
a c for the weighted system only requires the densities {pi} of 
the system and the curve a(p) for the unweighted system. 
Moreover, the curve for the unweighted system is convex 
upward, so that Jensen's inequality tells us that a c is below 
ct(p), except when the density distribution is concentrated at 
a single point, implying that the vi-t i minimization improves 
the compression threshold over the unweighted counterpart. 

It should also be noted that, when a block structure can 
be assumed for the source, the minimum a c is equal to 
the compression rate that would be achievable if each block 
were measured separately with the optimum compression 
rate for that block and then reconstructed individually from 
other blocks. This observation implies that introduction of 
optimal weights successfully compensates degradation due to 
intermixing of blocks of different densities in measurements. 

Finally we can note an interesting asymptotic in the result 
applicable in the case that p is close to one (\ small). In this 
case the optimal weights are assigned according to a simplified 
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Fig. 1, (color online) Compression threshold a c as a function of density 
{0.3 < p < 0.7, 5p = 0.3} for various Sw. Four curves are highlighted 
from a range < Sw < 0.99. The lower hull of all curves indicates the 
minimum a c achievable by optimal weight selection. Inset: The hull of all 
curves (corresponding to the maximum compression) can be constructed from 
the unweighted (Sw = 0) curve by a simple linear construction. 

form of ( fl9] l 

/2 exp(-u?/2) 
Pi/(1-Pi)~\ — -• (21) 

V 7T uf 
V. A TWO-BLOCK EXAMPLE 

A. Setting 

As a simple case we consider the source consisting of two 
equally-sized blocks labeled b = ±1, such that components 
in the block —1 are non-zero with some uniform lower 
probability than components of block +1. A corresponding 
asymmetry in the weights can be assumed, but weights must 
also be uniform within any block. This case allows a deal of 
intuition and has been studied previously owing to its simple 
structure Q. 

B. The large-system limit 

Let the density and the weight of block b = ±1 be 
Pb = p + b5p and Wb = 1 — bSw, respectively, with 
Sw, 5p > 0. Figure Q] demonstrates the result for a variety of 
weight asymmetry Sw given a density asymmetry 5p = 0.3. 
Since the density in each block must be in (0, 1) the relevant 
values of p are confined to the interval (0.3, 0.7). Curves for 
< Sw < 0.99 are plotted, with four labeled cases, the 
uppermost curve being the unweighted case. It can be seen that 
the compression threshold is reduced by allowing asymmetric 
weights, and that the lower envelope of all curves indicates 
the achievable performance by optimal selection amongst the 
weights. For the system (p+i, P-i) = (0.8, 0.2) the midpoint 
of the line connecting (p+i, a{p + \)) and (p-%, a(p-i)) co- 
incides with the lower envelope at p = 0.5 corresponding to 
the optimal compression rate for the two-block model. 
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Fig. 2. (color online) Compression threshold a c for unweighted (8w = 0) 
versus optimally weighted (8w = 0.684) reconstruction for a two-block 
model (p+i = 0.8, p— i = 0.2) for various system sizes. Each symbol 
represents the mean threshold from 10 6 trials each with independently 
generated measurements, error bars are small by comparison with symbol 
size. The data is fitted by x 2 regression. 

C. Numerical verification 

To test the theory we have generated a number of instances 
of the problem for various system sizes and solved these by 
linear programming. Components of A and non-zero compo- 
nents of x° were sampled from normal distributions. Optimally 
weighted, and unweighted, l\ minimization problems were 
solved to determine the compression threshold where optimal 
reconstruction failed. Figure |2] demonstrates result for a two- 
block system with a uniform weighting w+i = W-i and 
optimized weighting. It can be seen that the thresholds are 
distinct with a significant improvement in performance for the 
optimally weighted case. 

Results of fitting these numerical data with second-order 
polynomials in 1/N, obtained by % 2 regression, are also 
plotted in Fig.Q] Extrapolation to 1/N yields an estimate 
of the compression threshold a c — 0.742 73 ± 0.000 06 for 
the optimally weighted case, which is in agreement with the 
analytical result a c — 0.742 72. The extrapolated result for the 
unweighted case a c — 0.831 32±0.000 06 is also in agreement 
with the analytical result a c — 0.831 30. 

VI. Conclusion 

This paper has demonstrated a method for optimal selection 
of weights in the w-£i minimization utilizing prior knowledge 
about densities, and thereby providing the optimal threshold 
for compression. The result is a simple one with decoupling 
structure for source components, and described by a single 
order parameter in the case of perfect reconstruction. The 
threshold in the compression rate for which perfect reconstruc- 
tion is possible in a system of known marginal densities can 
be straightforwardly derived from the threshold curve for the 
unweighted system with a simple graphical procedure. This 
work should in future be extended to consider the effect of 
noise and of correlations between the source components. 



The method relied on the replica method and a saddle-point 
formulation, which although complicated in origin provides a 
concise and intuitive saddle-point framework from which to 
derive results. 

The analysis presented has been verified in experiment and 
provides a mechanism that may be immediately incorporated 
in practical problems where marginal density information is 
available. A complete description of the replica analysis and 
various extensions will be forthcoming in an article under 
preparation. 
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